heavy-tailed loss
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Vietnam (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York (0.04)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses
Moulin, Antoine, Esposito, Emmanuel, van der Hoeven, Dirk
We consider the problem setting of prediction with expert advice with possibly heavy-tailed losses, i.e.\ the only assumption on the losses is an upper bound on their second moments, denoted by $\theta$. We develop adaptive algorithms that do not require any prior knowledge about the range or the second moment of the losses. Existing adaptive algorithms have what is typically considered a lower-order term in their regret guarantees. We show that this lower-order term, which is often the maximum of the losses, can actually dominate the regret bound in our setting. Specifically, we show that even with small constant $\theta$, this lower-order term can scale as $\sqrt{KT}$, where $K$ is the number of experts and $T$ is the time horizon. We propose adaptive algorithms with improved regret bounds that avoid the dependence on such a lower-order term and guarantee $\mathcal{O}(\sqrt{\theta T\log(K)})$ regret in the worst case, and $\mathcal{O}(\theta\log(KT)/\Delta_{\min})$ regret when the losses are sampled i.i.d.\ from some fixed distribution, where $\Delta_{\min}$ is the difference between the mean losses of the second best expert and the best expert. Additionally, when the loss function is the squared loss, our algorithm also guarantees improved regret bounds over prior results.
- North America > United States > Texas (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
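To ground the expert-advice setting of the entry above, here is a minimal sketch of the classical exponentially weighted forecaster (Hedge) with a fixed learning rate; the paper's contribution is precisely the adaptive tuning this sketch does not do. The Student-t loss model, the tuning $\sqrt{\log(K)/T}$, and all names here are illustrative assumptions, not the authors' algorithm.

    import numpy as np

    def hedge(losses, eta):
        """Exponentially weighted forecaster (Hedge) with a FIXED rate eta.

        losses: (T, K) array of per-round expert losses (may be unbounded).
        Returns the algorithm's expected cumulative loss and its regret
        against the best single expert in hindsight.
        """
        T, K = losses.shape
        cum = np.zeros(K)                         # cumulative expert losses
        alg_loss = 0.0
        for t in range(T):
            w = np.exp(-eta * (cum - cum.min()))  # stabilized exponential weights
            p = w / w.sum()
            alg_loss += p @ losses[t]             # expected loss of sampling an expert
            cum += losses[t]
        return alg_loss, alg_loss - cum.min()

    # Illustrative heavy-tailed losses: bounded second moment, occasional spikes.
    rng = np.random.default_rng(0)
    T, K = 10_000, 16
    losses = rng.standard_t(df=3, size=(T, K))        # Student-t, 3 degrees of freedom
    print(hedge(losses, eta=np.sqrt(np.log(K) / T)))  # classical tuning for bounded losses

With heavy-tailed losses, a single large loss can swamp the weight update, which is one concrete way to see why a lower-order term driven by the maximum loss can come to dominate the regret bound.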
Fast learning rates with heavy-tailed losses
Dinh, Vu C., Ho, Lam S., Nguyen, Binh, Nguyen, Duy
We study fast learning rates when the losses are not necessarily bounded and may have a distribution with heavy tails. To enable such analyses, we introduce two new conditions: (i) the envelope function $\sup_{f \in \mathcal{F}} \ell \circ f$, where $\ell$ is the loss function and $\mathcal{F}$ is the hypothesis class, exists and is $L_r$-integrable, and (ii) $\ell$ satisfies the multi-scale Bernstein's condition on $\mathcal{F}$. Under these assumptions, we prove that learning rates faster than $O(n^{-1/2})$ can be obtained and, depending on $r$ and the multi-scale Bernstein's powers, can be arbitrarily close to $O(n^{-1})$. We then verify these assumptions and derive fast learning rates for the problem of vector quantization by $k$-means clustering with heavy-tailed distributions. The analyses enable us to obtain novel learning rates that extend and complement existing results in the literature from both theoretical and practical viewpoints.
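For context on condition (ii): the paper's multi-scale condition generalizes the standard (single-scale) Bernstein condition, which can be stated as follows, with $f^*$ the risk minimizer over $\mathcal{F}$; the exact multi-scale formulation, in which the exponent may vary with the scale of the excess risk, is in the paper.

\[
\mathbb{E}\bigl[(\ell \circ f - \ell \circ f^*)^2\bigr] \;\le\; B\,\bigl(\mathbb{E}[\ell \circ f - \ell \circ f^*]\bigr)^{\beta} \quad \text{for all } f \in \mathcal{F},
\]

for some constants $B > 0$ and $\beta \in (0, 1]$. Under boundedness, this condition classically yields rates of order $O(n^{-1/(2-\beta)})$, interpolating between the slow rate $O(n^{-1/2})$ as $\beta \to 0$ and the fast rate $O(n^{-1})$ at $\beta = 1$.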
Reviews: Fast learning rates with heavy-tailed losses
This paper provides new results in an important area that is receiving increasing attention: fast rates when loss functions are unbounded and heavy-tailed. Existing results based on empirical process theory often rely on bounded or sub-Gaussian losses, and the heavy-tailed (hence non-sub-Gaussian) case is considerably harder. The results presented seem sound and are definitely novel. They rely on results of Sara van de Geer and collaborators on concentration inequalities for unbounded empirical processes. The material is very technical, and I would suggest moving some more of it to the appendix.
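Finally, a small illustrative experiment for the $k$-means application in the Dinh et al. entry above: plain Lloyd's algorithm on heavy-tailed (Student-t) data, where the reported empirical quantization error is the quantity whose convergence rate the paper analyzes. This is a sketch under assumed settings (data model, $k$, iteration count), not a reproduction of the paper's experiments.

    import numpy as np

    def lloyd(X, k, iters=50, seed=0):
        """Plain Lloyd's algorithm for k-means vector quantization."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(iters):
            # squared distances from every point to every center
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)
            for j in range(k):
                pts = X[labels == j]
                if len(pts) > 0:              # keep old center if a cluster empties
                    centers[j] = pts.mean(axis=0)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return centers, d2.min(axis=1).mean() # empirical quantization error

    rng = np.random.default_rng(1)
    X = rng.standard_t(df=3, size=(2_000, 2))  # heavy-tailed 2-D sample
    _, err = lloyd(X, k=4)
    print(f"empirical quantization error: {err:.3f}")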